In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
import dateutil.parser

First, I made a mistake naming the data set! It's 2015 data, not 2014 data. But yes, still use 311-2014.csv. You can rename it.

Importing and preparing your data

Import your data, but only the first 200,000 rows. You'll also want to change the index to be a datetime based on the Created Date column - you'll want to check if it's already a datetime, and parse it if not.


In [216]:
df = pd.read_csv('311-2015.csv', dtype = str)

In [217]:
df.head()


Out[217]:
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address ... Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name Latitude Longitude Location
0 31015465 07/06/2015 10:58:27 AM 07/22/2015 01:07:20 AM DCA Department of Consumer Affairs Consumer Complaint Demand for Cash NaN 11360 27-16 203 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.773539552542 -73.78823697228408 (40.773539552542, -73.78823697228408)
1 30997660 07/03/2015 01:26:29 PM 07/03/2015 02:08:20 PM NYPD New York City Police Department Vending In Prohibited Area Residential Building/House 10019 200 CENTRAL PARK SOUTH ... NaN NaN NaN NaN NaN NaN NaN 40.76702142171206 -73.97944780718524 (40.76702142171206, -73.97944780718524)
2 31950223 11/09/2015 03:55:09 AM 11/09/2015 08:08:57 AM NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk 10453 1993 GRAND AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.85267061877697 -73.91060771362552 (40.85267061877697, -73.91060771362552)
3 31000038 07/03/2015 02:18:32 AM 07/03/2015 07:54:48 AM NYPD New York City Police Department Noise - Commercial Loud Music/Party Club/Bar/Restaurant 11372 84-16 NORTHERN BOULEVARD ... NaN NaN NaN NaN NaN NaN NaN 40.755773786469966 -73.88326243225418 (40.755773786469966, -73.88326243225418)
4 30995614 07/04/2015 12:03:27 AM 07/04/2015 03:33:09 AM NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 11216 1057 BERGEN STREET ... NaN NaN NaN NaN NaN NaN NaN 40.67617516102934 -73.9512690004692 (40.67617516102934, -73.9512690004692)

5 rows × 53 columns


In [133]:
import datetime

In [134]:
def created_date_to_datetime(date_str):
    return datetime.datetime.strptime(date_str, '%m/%d/%Y %I:%M:%S %p')

In [218]:
df['created_datetime'] = df['Created Date'].apply(created_date_to_datetime)

In [219]:
df = df.set_index('created_datetime')

What was the most popular type of complaint, and how many times was it filed?


In [220]:
freq_complaints = df[['Unique Key', 'Complaint Type']].groupby('Complaint Type').count().sort_values('Unique Key', ascending=False).head()

In [221]:
freq_complaints


Out[221]:
Unique Key
Complaint Type
HEAT/HOT WATER 225706
Noise - Residential 208026
Street Condition 124379
Blocked Driveway 100881
Street Light Condition 98106

Make a horizontal bar graph of the top 5 most frequent complaint types.


In [222]:
ax = freq_complaints.plot(kind = 'barh', legend = False)
ax.set_title('5 Most Frequent 311 Complaints')
ax.set_xlabel('Number of Complaints in 2015')


Out[222]:
<matplotlib.text.Text at 0x11fe9a940>

Which borough has the most complaints per capita? Since it's only 5 boroughs, you can do the math manually.


In [223]:
df[['Unique Key', 'Borough']].groupby('Borough').count().sort_values('Unique Key', ascending = False)


Out[223]:
Unique Key
Borough
BROOKLYN 684215
QUEENS 515349
MANHATTAN 473782
BRONX 410768
STATEN ISLAND 106796
Unspecified 95841

According to your selection of data, how many cases were filed in March? How about May?


In [224]:
cases_in_mar = df[df.index.month == 3]['Unique Key'].count()
print('There were', cases_in_mar, 'cases filed in March.')


There were 206907 cases filed in March.

In [225]:
cases_in_may = df[df.index.month == 5]['Unique Key'].count()
print('There were', cases_in_may, 'cases filed in May.')


There were 187137 cases filed in May.

I'd like to see all of the 311 complaints called in on April 1st.

Surprise! We couldn't do this in class, but it was just a limitation of our data set


In [226]:
april_1_complaints = df[(df.index.month == 4) & (df.index.day == 1)][['Unique Key', 'Created Date', 'Complaint Type', 'Descriptor']]
april_1_complaints


Out[226]:
Unique Key Created Date Complaint Type Descriptor
created_datetime
2015-04-01 21:37:42 30311691 04/01/2015 09:37:42 PM Illegal Parking Blocked Sidewalk
2015-04-01 23:12:04 30307701 04/01/2015 11:12:04 PM Noise - Commercial Loud Music/Party
2015-04-01 13:10:35 30313389 04/01/2015 01:10:35 PM Root/Sewer/Sidewalk Condition Trees and Sidewalks Program
2015-04-01 17:37:38 30314393 04/01/2015 05:37:38 PM Maintenance or Facility Hours of Operation
2015-04-01 12:32:40 30309207 04/01/2015 12:32:40 PM Consumer Complaint Installation/Work Quality
2015-04-01 18:44:50 30311759 04/01/2015 06:44:50 PM Damaged Tree Entire Tree Has Fallen Down
2015-04-01 16:30:15 30309690 04/01/2015 04:30:15 PM Animal Abuse Neglected
2015-04-01 09:04:07 30307990 04/01/2015 09:04:07 AM SCRIE Miscellaneous
2015-04-01 07:46:58 30308253 04/01/2015 07:46:58 AM Blocked Driveway No Access
2015-04-01 17:12:17 30314214 04/01/2015 05:12:17 PM Highway Condition Pothole - Highway
2015-04-01 21:30:48 30307111 04/01/2015 09:30:48 PM Food Establishment Food Temperature
2015-04-01 15:51:04 30311571 04/01/2015 03:51:04 PM Maintenance or Facility Hours of Operation
2015-04-01 10:43:28 30313817 04/01/2015 10:43:28 AM Damaged Tree Branch Cracked and Will Fall
2015-04-01 15:12:46 30308922 04/01/2015 03:12:46 PM Food Establishment Letter Grading
2015-04-01 06:15:42 30311132 04/01/2015 06:15:42 AM Highway Condition Pothole - Highway
2015-04-01 11:28:02 30308180 04/01/2015 11:28:02 AM Highway Condition Pothole - Highway
2015-04-01 17:35:18 30313207 04/01/2015 05:35:18 PM Food Establishment Rodents/Insects/Garbage
2015-04-01 13:54:54 30310017 04/01/2015 01:54:54 PM SCRIE Miscellaneous
2015-04-01 23:49:33 30306774 04/01/2015 11:49:33 PM Noise - Commercial Loud Music/Party
2015-04-01 07:50:49 30313339 04/01/2015 07:50:49 AM Street Condition Rough, Pitted or Cracked Roads
2015-04-01 13:50:29 30312146 04/01/2015 01:50:29 PM Food Establishment Rodents/Insects/Garbage
2015-04-01 16:14:19 30313259 04/01/2015 04:14:19 PM Benefit Card Replacement Medicaid
2015-04-01 19:27:34 30308920 04/01/2015 07:27:34 PM Noise - Street/Sidewalk Loud Music/Party
2015-04-01 05:30:02 30314164 04/01/2015 05:30:02 AM Highway Condition Pothole - Highway
2015-04-01 10:33:26 30311790 04/01/2015 10:33:26 AM Illegal Parking Blocked Sidewalk
2015-04-01 11:47:38 30310940 04/01/2015 11:47:38 AM SCRIE Miscellaneous
2015-04-01 11:01:27 30310409 04/01/2015 11:01:27 AM Consumer Complaint Exchange/Refund/Return
2015-04-01 08:51:52 30310350 04/01/2015 08:51:52 AM Consumer Complaint Cars Parked on Sidewalk/Street
2015-04-01 14:58:55 30313106 04/01/2015 02:58:55 PM SCRIE Rent Discrepancy
2015-04-01 16:59:19 30309324 04/01/2015 04:59:19 PM Blocked Driveway Partial Access
... ... ... ... ...
2015-04-01 10:10:40 30313973 04/01/2015 10:10:40 AM DOF Property - Owner Issue Billing Address Incorrect
2015-04-01 09:45:31 30314014 04/01/2015 09:45:31 AM DOF Property - Reduction Issue Commercial J51 Exemption
2015-04-01 10:39:40 30314031 04/01/2015 10:39:40 AM Noise - Residential Banging/Pounding
2015-04-01 14:02:44 30314032 04/01/2015 02:02:44 PM Noise - Residential Banging/Pounding
2015-04-01 12:57:43 30314033 04/01/2015 12:57:43 PM Noise - Residential Loud Music/Party
2015-04-01 21:13:06 30314044 04/01/2015 09:13:06 PM Noise - Residential Loud Music/Party
2015-04-01 15:55:36 30314045 04/01/2015 03:55:36 PM Noise - Residential Loud Music/Party
2015-04-01 21:19:35 30314046 04/01/2015 09:19:35 PM Noise - Residential Loud Music/Party
2015-04-01 18:56:00 30314047 04/01/2015 06:56:00 PM Noise - Residential Loud Talking
2015-04-01 03:29:02 30314048 04/01/2015 03:29:02 AM Noise - Residential Loud Talking
2015-04-01 03:03:18 30314071 04/01/2015 03:03:18 AM Noise - Residential Loud Music/Party
2015-04-01 21:36:56 30314072 04/01/2015 09:36:56 PM Noise - Residential Loud Television
2015-04-01 16:59:18 30314077 04/01/2015 04:59:18 PM Housing Options Assisted Living
2015-04-01 11:51:47 30314087 04/01/2015 11:51:47 AM DOF Parking - Request Copy List of Outstanding Tickets
2015-04-01 10:04:23 30314088 04/01/2015 10:04:23 AM DOF Property - Owner Issue Billing Address Incorrect
2015-04-01 14:31:19 30314090 04/01/2015 02:31:19 PM DOF Property - Request Copy Copy of Account Information
2015-04-01 13:02:25 30314091 04/01/2015 01:02:25 PM DOF Property - Request Copy Copy of Statement
2015-04-01 12:48:02 30314092 04/01/2015 12:48:02 PM DOF Property - Request Copy Copy of Statement
2015-04-01 09:42:34 30314234 04/01/2015 09:42:34 AM Home Delivered Meal Complaint NaN
2015-04-01 14:54:23 30314238 04/01/2015 02:54:23 PM DOF Parking - Request Copy Decision and Order
2015-04-01 17:41:01 30314261 04/01/2015 05:41:01 PM Housing - Low Income Senior NaN
2015-04-01 15:53:30 30314291 04/01/2015 03:53:30 PM DOF Property - Payment Issue Interest Dispute
2015-04-01 10:31:33 30314440 04/01/2015 10:31:33 AM New Tree Request For One Address
2015-04-01 13:55:04 30314441 04/01/2015 01:55:04 PM New Tree Request For One Address
2015-04-01 10:16:08 30314442 04/01/2015 10:16:08 AM New Tree Request For One Address
2015-04-01 10:24:52 30314499 04/01/2015 10:24:52 AM New Tree Request For One Address
2015-04-01 15:01:35 30314500 04/01/2015 03:01:35 PM New Tree Request For One Address
2015-04-01 15:26:36 30314519 04/01/2015 03:26:36 PM New Tree Request For One Address
2015-04-01 11:28:23 30314540 04/01/2015 11:28:23 AM DOF Property - Payment Issue Other Billing Issue
2015-04-01 10:09:13 30314547 04/01/2015 10:09:13 AM DOF Property - Payment Issue Other Billing Issue

6801 rows × 4 columns

What was the most popular type of complaint on April 1st?


In [227]:
april_1_complaints[['Unique Key', 'Complaint Type']].groupby('Complaint Type').count().sort_values('Unique Key', ascending = False).head(1)


Out[227]:
Unique Key
Complaint Type
Street Condition 839

What were the most popular three types of complaint on April 1st


In [228]:
april_1_complaints[['Unique Key', 'Complaint Type']].groupby('Complaint Type').count().sort_values('Unique Key', ascending = False).head(3)


Out[228]:
Unique Key
Complaint Type
Street Condition 839
HEAT/HOT WATER 581
Noise - Residential 330

What month has the most reports filed? How many? Graph it.


In [229]:
complaints_by_month = df['Unique Key'].groupby(df.index.month).count()
complaints_by_month


Out[229]:
1     190501
2     202520
3     206907
4     182611
5     187137
6     198527
7     194254
8     185412
9     182099
10    197757
11    183091
12    175935
Name: Unique Key, dtype: int64

In [230]:
ax = complaints_by_month.plot()
ax.set_xticks([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
ax.set_xticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
ax.set_ylabel('Number of Complaints')
ax.set_title('311 Complaints by Month in 2015')

# x_values = df.groupby(df.index.month).median().index
min_values = 0
max_values = complaints_by_month
ax.fill_between(x_values, min_values, max_values, alpha = 0.4)


Out[230]:
<matplotlib.collections.PolyCollection at 0x12cf887f0>

What week of the year has the most reports filed? How many? Graph the weekly complaints.


In [231]:
complaints_by_week = df['Unique Key'].groupby(df.index.week).count()
complaints_by_week


Out[231]:
1     17522
2     54405
3     40467
4     39814
5     43132
6     49582
7     45936
8     55971
9     50964
10    47544
11    50035
12    46601
13    43726
14    41302
15    42120
16    42385
17    44138
18    42740
19    43769
20    42648
21    41321
22    43502
23    44435
24    45616
25    46575
26    46653
27    42723
28    43225
29    44749
30    44508
31    44164
32    43696
33    41367
34    42344
35    41424
36    43196
37    41233
38    42804
39    42059
40    45012
41    41819
42    44769
43    46320
44    43901
45    41293
46    44877
47    45365
48    39159
49    44761
50    41230
51    41442
52    32015
53    24393
Name: Unique Key, dtype: int64

In [232]:
ax = complaints_by_week.plot()
ax.set_xticks(range(1,53))
ax.set_xticklabels(['', '', '', '', '5', 
                    '', '', '', '', '10', 
                    '', '', '', '', '15',
                   '', '', '', '', '20',
                   '', '', '', '', '25',
                   '', '', '', '', '30',
                   '', '', '', '', '35',
                   '', '', '', '', '40',
                   '', '', '', '', '45',
                   '', '', '', '', '50',
                   '', '',])
ax.set_ylabel('Number of Complaints')
ax.set_xlabel('Week of the Year')
ax.set_title('311 Complaints by Week in 2015')


Out[232]:
<matplotlib.text.Text at 0x112372780>

Noise complaints are a big deal. Use .str.contains to select noise complaints, and make an chart of when they show up annually. Then make a chart about when they show up every day (cyclic).


In [233]:
noise_complaints = df[df['Complaint Type'].str.contains('Noise') == True]

In [234]:
noise_complaints_hour = noise_complaints['Unique Key'].groupby(noise_complaints.index.hour).count()

In [235]:
ax = noise_complaints_hour.plot()
ax.set_xticks(range(0,24))
ax.set_xticklabels(['Midnight', '', '', '', '', '', 
                    '6 am', '', '', '', '', '',
                    'Noon', '', '', '', '', '',
                    '6 pm', '', '', '', '', ''])
ax.set_ylabel('Number of Complaints')
ax.set_xlabel('Time of Day')
ax.set_title('Noise Complaints by Time of Day in 2015')


Out[235]:
<matplotlib.text.Text at 0x113f87c88>

Which were the top five days of the year for filing complaints? How many on each of those days? Graph it.


In [236]:
top_complaining_days = df['Unique Key'].resample('D').count().sort_values(ascending = False).head()
top_complaining_days


Out[236]:
created_datetime
2015-01-08    11318
2015-02-16    10434
2015-11-13     9818
2015-10-19     9662
2015-02-20     9438
Name: Unique Key, dtype: int64

In [237]:
ax = top_complaining_days.plot(kind = 'barh')
ax.set_ylabel('Date')
ax.set_xlabel('Number of Complaints')
ax.set_title('The Top 5 Days for 311 Complaints in 2015')


Out[237]:
<matplotlib.text.Text at 0x11503fb70>

In [238]:
complaining_days = df['Unique Key'].resample('D').count()
ax = complaining_days.plot()
ax.set_ylabel('Number of Complaints')
ax.set_xlabel('Day of Year')
ax.set_title('Noise Complaints by Day in 2015')


Out[238]:
<matplotlib.text.Text at 0x115866668>

Interesting—it looks cyclical. Let's see what day of the week is most popular:


In [243]:
df.index[1]


Out[243]:
Timestamp('2015-07-03 13:26:29')

In [254]:
def get_day_of_wk(timestamp):
    return datetime.datetime.strftime(timestamp, '%a')

In [255]:
df['datetime'] = df.index

In [256]:
df['day_of_wk'] = df['datetime'].apply(get_day_of_wk)

In [258]:
complaining_day_of_wk = df[['Unique Key', 'day_of_wk']].groupby('day_of_wk').count()

In [272]:
complaining_day_of_wk['number_of_day'] = [6, 2, 7, 1, 5, 3, 4]

In [275]:
complaining_day_of_wk_sorted = complaining_day_of_wk.sort_values('number_of_day')

In [277]:
complaining_day_of_wk_sorted


Out[277]:
Unique Key number_of_day
day_of_wk
Sun 251452 1
Mon 367697 2
Tue 364313 3
Wed 357175 4
Thu 345647 5
Fri 333317 6
Sat 267150 7

In [282]:
ax = complaining_day_of_wk_sorted.plot(y = 'Unique Key', legend = False)


What hour of the day are the most complaints? Graph a day of complaints.


In [283]:
hourly_complaints = df['Unique Key'].groupby(df.index.hour).count()

In [284]:
ax = hourly_complaints.plot()
ax.set_xticks(range(0,24))
ax.set_xticklabels(['Midnight', '', '', '', '', '', 
                    '6 am', '', '', '', '', '',
                    'Noon', '', '', '', '', '',
                    '6 pm', '', '', '', '', ''])
ax.set_ylabel('Number of Complaints')
ax.set_xlabel('Time of Day')
ax.set_title('311 Complaints by Time of Day in 2015')


Out[284]:
<matplotlib.text.Text at 0x119319390>

One of the hours has an odd number of complaints. What are the most common complaints at that hour, and what are the most common complaints the hour before and after?


In [285]:
# 11 pm
df[df.index.hour == 23][['Unique Key', 'Complaint Type', 'Descriptor']].groupby('Complaint Type').count().sort_values('Unique Key', ascending = False).head()


Out[285]:
Unique Key Descriptor
Complaint Type
Noise - Residential 27576 27576
Noise - Street/Sidewalk 7293 7293
Noise - Commercial 6955 6955
Blocked Driveway 4775 4775
Illegal Parking 4373 4373

In [286]:
# 12 am
df[df.index.hour == 0][['Unique Key', 'Complaint Type', 'Descriptor']].groupby('Complaint Type').count().sort_values('Unique Key', ascending = False).head()


Out[286]:
Unique Key Descriptor
Complaint Type
HEAT/HOT WATER 142330 142330
Rodent 28567 28567
PAINT/PLASTER 28165 28165
UNSANITARY CONDITION 27962 27962
PLUMBING 24137 24137

In [287]:
# 1 am
df[df.index.hour == 1][['Unique Key', 'Complaint Type', 'Descriptor']].groupby('Complaint Type').count().sort_values('Unique Key', ascending = False).head()


Out[287]:
Unique Key Descriptor
Complaint Type
Noise - Residential 16893 16893
Noise - Commercial 4685 4685
Noise - Street/Sidewalk 3819 3819
Noise 2282 2282
Blocked Driveway 2267 2267

So odd. What's the per-minute breakdown of complaints between 12am and 1am? You don't need to include 1am.


In [291]:
midnight_complaints = df[df.index.hour == 0][['Unique Key', 'Complaint Type']]

for minute in range(0, 60):
    top_complaint = midnight_complaints[midnight_complaints.index.minute == minute].groupby('Complaint Type').count().sort_values('Unique Key', ascending = False).head(1)
    if minute < 10:
        minute = '0' + str(minute)
    else:
        minute = str(minute)
    print('12:' + minute + '\'s top complaint was:', top_complaint)
    print('')
# hourly_complaints = df['Unique Key'].groupby(df.index.hour).count()


12:00's top complaint was:                 Unique Key
Complaint Type            
HEAT/HOT WATER      140688

12:01's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         366

12:02's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         372

12:03's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         387

12:04's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         390

12:05's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         441

12:06's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         429

12:07's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         398

12:08's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         415

12:09's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         384

12:10's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         405

12:11's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         411

12:12's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         383

12:13's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         386

12:14's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         379

12:15's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         417

12:16's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         382

12:17's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         403

12:18's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         393

12:19's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         365

12:20's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         379

12:21's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         371

12:22's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         398

12:23's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         383

12:24's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         387

12:25's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         351

12:26's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         384

12:27's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         353

12:28's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         354

12:29's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         411

12:30's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         345

12:31's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         380

12:32's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         412

12:33's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         351

12:34's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         362

12:35's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         403

12:36's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         359

12:37's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         378

12:38's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         374

12:39's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         356

12:40's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         354

12:41's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         339

12:42's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         365

12:43's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         382

12:44's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         341

12:45's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         348

12:46's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         369

12:47's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         317

12:48's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         310

12:49's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         361

12:50's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         326

12:51's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         312

12:52's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         351

12:53's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         356

12:54's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         337

12:55's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         355

12:56's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         305

12:57's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         361

12:58's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         367

12:59's top complaint was:                      Unique Key
Complaint Type                 
Noise - Residential         296

Looks like midnight is a little bit of an outlier. Why might that be? Take the 5 most common agencies and graph the times they file reports at (all day, not just midnight).

5 Top Agencies:


In [292]:
df[['Unique Key', 'Agency']].groupby('Agency').count().sort_values('Unique Key', ascending = False).head()


Out[292]:
Unique Key
Agency
HPD 629352
NYPD 580090
DOT 318886
DEP 185305
DSNY 155217

In [293]:
def agency_hourly_complaints(agency_name_str):
    agency_complaints = df[df['Agency'] == agency_name_str]
    return agency_complaints['Unique Key'].groupby(agency_complaints.index.hour).count()

In [294]:
ax = agency_hourly_complaints('HPD').plot(label = 'HPD', legend = True)
for x in ['NYPD', 'DOT', 'DEP', 'DSNY']:
    agency_hourly_complaints(x).plot(ax = ax, label = x, legend = True)


Graph those same agencies on an annual basis - make it weekly. When do people like to complain? When does the NYPD have an odd number of complaints?


In [295]:
def agency_weekly_complaints(agency_name_str):
    agency_complaints = df[df['Agency'] == agency_name_str]
    return agency_complaints['Unique Key'].groupby(agency_complaints.index.week).count()

In [296]:
ax = agency_weekly_complaints('NYPD').plot(label = 'NYPD', legend = True)
for x in ['DOT', 'HPD', 'DPR', 'DSNY']:
    agency_weekly_complaints(x).plot(ax = ax, label = x, legend = True)



In [299]:
NYPD_complaints = df[df['Agency'] == 'NYPD']
NYPD_weekly_complaints = NYPD_complaints['Unique Key'].groupby(NYPD_complaints.index.week).count()
NYPD_weekly_complaints[NYPD_weekly_complaints > 13000]


Out[299]:
19    13959
22    14404
23    13286
27    13836
35    13048
38    13198
39    13966
Name: Unique Key, dtype: int64

In [300]:
# # a way to use the function agency_weekly_complaints that's actually longer than not using it.
# week_number = 0
# for week in agency_weekly_complaints('NYPD'):
#     week_number += 1
#     if week > 1500:
#         print('In week', week_number)
#         print('there were', week, 'complaints.')
#         print('')

It looks like complaints are most popular in May, June, September—generally in the summer.


In [301]:
NYPD_weekly_complaints[NYPD_weekly_complaints < 6000]


Out[301]:
1     5590
53    5505
Name: Unique Key, dtype: int64

It looks like complaints are least popular in around Christmas and New Year's.

Maybe the NYPD deals with different issues at different times? Check the most popular complaints in July and August vs the month of May. Also check the most common complaints for the Housing Preservation Bureau (HPD) in winter vs. summer.

Most popular NYPD complaints in July and August:


In [302]:
NYPD_complaints = df[df['Agency'] == 'NYPD']
NYPD_jul_aug_complaints = NYPD_complaints[(NYPD_complaints.index.month == 7) | (NYPD_complaints.index.month == 8)][['Unique Key', 'Complaint Type']]
NYPD_jul_aug_complaints.groupby('Complaint Type').count().sort_values('Unique Key', ascending = False).head()


Out[302]:
Unique Key
Complaint Type
Noise - Residential 37509
Illegal Parking 16287
Noise - Street/Sidewalk 15463
Blocked Driveway 15428
Noise - Commercial 5973

Most popular NYPD complaints in May:


In [303]:
NYPD_complaints = df[df['Agency'] == 'NYPD']
NYPD_may_complaints = NYPD_complaints[(NYPD_complaints.index.month == 5)][['Unique Key', 'Complaint Type']]
NYPD_may_complaints.groupby('Complaint Type').count().sort_values('Unique Key', ascending = False).head()


Out[303]:
Unique Key
Complaint Type
Noise - Residential 22923
Illegal Parking 8105
Blocked Driveway 8103
Noise - Street/Sidewalk 7157
Noise - Commercial 4922

Most popular HPD complaints in June, July, and August:


In [304]:
HPD_complaints = df[df['Agency'] == 'HPD']
HPD_jun_jul_aug_complaints = HPD_complaints[(HPD_complaints.index.month == 6) |
                                           (HPD_complaints.index.month == 7) |
                                           (HPD_complaints.index.month == 8)][['Unique Key', 'Complaint Type']]
HPD_jun_jul_aug_complaints.groupby('Complaint Type').count().sort_values('Unique Key', ascending = False).head()


Out[304]:
Unique Key
Complaint Type
UNSANITARY CONDITION 23687
PAINT/PLASTER 17653
PLUMBING 12617
DOOR/WINDOW 9440
WATER LEAK 8679

Most popular HPD complaints in December, January, and February:


In [305]:
HPD_complaints = df[df['Agency'] == 'HPD']
HPD_dec_jan_feb_complaints = HPD_complaints[(HPD_complaints.index.month == 12) |
                                           (HPD_complaints.index.month == 1) |
                                           (HPD_complaints.index.month == 2)][['Unique Key', 'Complaint Type']]
HPD_dec_jan_feb_complaints.groupby('Complaint Type').count().sort_values('Unique Key', ascending = False).head()


Out[305]:
Unique Key
Complaint Type
HEAT/HOT WATER 121055
UNSANITARY CONDITION 18118
PAINT/PLASTER 16632
PLUMBING 15799
DOOR/WINDOW 10801